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I . INTRODUCTION 


In  light  of  the  increased  emphasis  on  analysis  and  modeling,  the 
need  to  extract  the  maximum  amount  of  information  from  difficult-to- 
augment,  scattered  data  bases,  and  the  growing  availability  of  power- 
ful computers,  it  is  not  surprising  that  data-fitting  has  become  an 
important  facet  of  Army  research  and  development.  Of  course,  the 
fitting  of  standard  functions  is  a well  developed  field,  and  "canned" 
programs  are  available  at  any  major  computing  facility.  Examples  of 
these  are  linear  regression  and  Fourier  analysis  programs.  However, 
the  standard  functions  are  linear  in  at  least  the  fitting  parameters, 
or  can  be  made  linear  by  a simple  transformation.  For  example,  a 
Fourier  expansion 
00 

Y - cos(a)t)  + sin((i)t)J 


is  linear  in  the  coefficients  A,  B,  and  (but  not  ui) ; and  the  function 

Y = Aj  exp  (- 


can  be  made  linear  by  taking  logrithms 


In  Y = C, 


Aj  x; 


Cj  i In  Aj. 


Unfortunately,  the  functional  forms  of  the  expressions  which  best 
reflect  an  analyst's  understanding  of  data-producing  phenomena  are 
often  not  linear.  For  such  cases,  "canned"  programs  are  usually  not 
available.  The  analyst  is  then  faced  with  three  alternatives:  he  can 
attempt  to  data-fit  by  hand  or  by  eye;  he  can  develop  his  own  "canned" 
program;  or  he  can  use  a "canned",  quasi-linear  iterative  routine. 

The  first  alternative  is  prohibitive  if  more  than  a few  data  points 
or  more  than  a few  parameters  are  involved.  The  second  alternative 
is  possible,  and  often  preferred,  if  very  large  amounts  of  data  are 
being  fit  by  a well-established,  unchanging  functional  form.  However, 
the  development  of  a fitting  program  is  time  consuming,  and  is  thus 
impractical  when  less  than  very  large  amounts  of  data  are  involved, 
or  when  the  functional  form  is  experimental  and  is  often  changed 
(e.g.  in  establishing  an  optimum  form). 

The  third  alternative  has  widespread  acceptance,  as  attested  to 
by  the  large  number  of  users  of  the  BRL  program  FNFIT.l  Since  its 
original  development  for  use  in  the  BRL  nuclear  dosimetry  program, 
FNFIT  has  been  extensively  used  in  laser  vulnerability,  laser  effects. 


^J.T.  Klopaio,  "FNFIT:  An  Eaay-to-Use,  Arbitrary  Funotion-to-Data 
Fitting  Routine  (A  Veer's  Manual),"  BRL-MR~2402.  (Aug  74) 

(AD  #A000655) 


smciDXiia  PMS  blamc-mot  numo 


radiation  shielding,  nuclear  pulse  spectral  analysis,  and  laser  beam 
diagnostics,  as  well  as  in  many  "one-shot"  laboratory  utility  applica- 
tions. Outside  users  have  included  the  U.S.  Forestry  Service,  U.S.  Army 
Chemical  Systems  Laboratory,  and  private  indtistry. 

As  originally  published,  however,  the  FNFIT  routine  had  shortcomings 
which  limited  its  usefulness.  First,  in  spite  of  the  title  of  reference 
1,  FNFIT  proved  not  to  be  "easy-to-use"  unless  the  user  had  initial 
instruction;  and  even  with  experience,  a manual  was  needed  to  recall 
data  order.  Secondly,  FNFIT  allowed  only  one  independent  variable. 

Hence  functions  such  as  the  Foiarier  analysis  shown  above,  in  which  the 
dependent  variable  (Y)  is  measured  as  a function  of  one  independent 
variable  (t) , could  be  fit.  However,  a general  expression  for  the 
time  required  for  a laser  to  penetrate  a plate,  which  is  a function 
of  such  independent  variables  as  plate  thickness,  laser  intensity  and 
power,  and  wind  velocity,  could  not  be  fit.  Shortcomings  also  existed 
in  the  available  outputs,  especially  the  incomp letemess  of  the  Calconp 
package,  and  in  the  available  documentation. 

In  view  of  the  widespread  usage  of  the  technique  and  the  iiiq)rovement 
in  local  computer  facilities,  it  was  decided  to  upgrade  the  FNFIT  program. 
The  result  is  GENFIT,  a fully  interactive  program  to  fit  arbitrary  func- 
tions of  multiple  parameters  and  independent  variables  to  data,  and 
plot  the  results  as  a function  of  any  independent  variable. 


II.  THEORY  OF  OPERATION 

The  theory  of  operation  of  GENFIT  is,  in  part,  identical  to  that 
of  FNFIT  (described  in  reference  1) . However,  differences  do  exist. 
Therefore,  a restatement  will  be  presented  here. 


For  statistical  reasons, 
statistic 


2 

X 


NDATA 

■s 


* 

wCDK  [y(i) 


NDm 

where  2-  W(l)  = 1. 
I»1 


the  goodness  of  fit  is  given  by  the 


- F(I)]f/ (NDATA  - NP) 


(1) 


& • • • 

Appltecf  Optimal  Eatimation,  ed.  A.  Gelh.  M.I.T.  Preae^  Cambridge^  MA, 
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P.R.  Bevington,  Data  Reduation  and  Error  Analueia  for  the  Phuaiaal 
Soienaee,  MoGraw  - Hill  Book  Co.,  N.Y. , N.Y.  (1969) 

A 

y.  Beere,  Intpoduatvon  to  the  Theory  of  Error,  Addison  - Wealey 
Publishing  Co.,  Reading,  MA.  (1957) 
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Here,  the  fit  is  to  NDATA  data  points,  each  point  being  of  the  form. 


Xj,  X2,  Xj,  . . . Xj^j,  Y,  W 

where  X^,  X-  . . . = independent  variables  (e.g.  pre-set  experimental 
conditions) 

NI  = nvunber  of  independent  variables 
Y = dependent  variable  (measured  result) 

W = weight  to  be  given  to  each  datum 

F(I)  = F[X  (I),  X (I).  . . A(l).  A(2),  . . . A(NPARAM)] 

is  the  value  of  tne  fitting  fmctions,  evaluated  at  the 
data  point 

A(J)  = the  parameters  of  the  fitting  function 

NPARAM  = number  of  parameters  [A(J)]  in  F and 

NP  = NPARAM  minus  the  number  of  parameters  held  constant 
(see  NHOLD,  below). 

It  is  worthwhile  to  emphasize  that,  although  the  'X's  may  have  been 
unknown,  measured  quantities  in  the  data-taking  process,  they  are  not 
the  unknowns  in  the  data  fitting.  The  unknowns  are  the  'A's,  the  para- 
meters of  the  function,  whereas  the  'X's,  'Y's,  and  'W's  are  known  data. 


X^  as 


The  problem  can  be  succintly  written  in  matrix  notation  by  writing 


= (Y  - F)"^  W (Y  - F) 


where  Y and  £ are  NDATA-dimensional  vectors , each  element  being  one 
data  point,  and  W a diagonal  matrix  whose  elements  are  the 
respective  weights  of  each  data  point  divided  by  (NDATA-NP) . 

(The  convention  adopted  here  is  to  define  a vector  as  an  N x 1 column 
matrix:  thus  the  defined  in  equation  2 is  the  same  scalar  as  defined 
in  equation  1.) 

The  desired  solution  of  the  problem  is  the  NPARAM-dimensional 
vector  A*  which  minimizes  x^-  From  the  calculus,  the  minimum  of  a 
scalar  with  respect  to  a vector  is  achieved  when 
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If  F were  linear  in  the  A's,  one  could  write 

linear  " £ (4) 

where  C is  an  NDATA  - dimensional  vector  independent  of  A,  and  H is 
an  NDATA^x  NPARAM  dimensional  matrix.^  The  solution  for  the 
vector  A which  satisfies  equation  3 is  straight  forward. 2 

i* linear  ' 1 (S) 

This  is  the  case  of  "canned"  programs  discussed  previously. 

The  method  used  in  GENFIT  is  to  force  the  problem  into  a linear 

(equation  4)  form.  This  is  accon^lished  by  expanding  F to  first  order 

in  a Taylor  series  about  a starting  value  A . ~ 

o 


F = F (A  ) + 
— — — o 


This  is  seen  to  be  of  the  form  of  equation  4 where 

C = F (A  ) + H'M 


Hi-  = 3F./M, 


AA  = A - A 
— — o 


Since  equation  3 is  equivalent  to 


3X  = 0 


the  solution  which  satisfies  equation  3'  is  given  by  equation  5 as 

^ « (H^W  H)'^  H^W  Y (8) 

where  H is  given  by  equation  7. 

The  GENFIT  algorithm  is  straight  forward.  An  initial  guess  for  A 
is  input  by  the  user.  GENFIT  takes  this  guess  as  A The  elements  of 
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H are  evaluated  by  fitting  F in  a fifth  order  polynomial  in  A at  each 
data  point  and  evaluating  the  derivative  in  closed  form.  The  initial 
guess  is  used  to  indicate  the  approximate  magnitude  of  the  elements  of 
A for  the  purpose  of  constructing  the  polynomial  fit.  Having  constructed 
the  elements  of  H,  ^ is  obtained  from  equation  8,  A from  equation  7, 
and  from  equation  2.  At  this  point,  ^ is  checked  to  see  if  at  least 
one  element  is  significant.  If  no  significant  element  remains,  A * A = 
A : a solution  has  been  found.  However,  because  of  the  linear  aj^prox- 
imations  involved,  ^ may  not  be  the  exact  solution  of  equation  3^ . 
Therefore,  if  M has  any  significant  element,  A ^ is  replaced  by 
A ■ A o + the  elements  of  H are  reconstructed,  and  M recalculated. 
T^iis  procedure  continues  until  M becomes  insignificant,  or  the  specified 
maximum  number  of  iterations  have  been  taken. 


Because  of  the  non-linear  nature  of  the  problem,  it  is  possible  for 
successive  values  of  M to  diverge  or  oscillate.  In  order  to  control 
the  search,  give  all  elements  of  A an  opportxjnity  to  adjust,  and  prevent 
elements  of  A from  passing  through  impossible  or  divergence -causing 
values,  GENFIT  includes  the  COOLIT  and  HOLD  options.  COOLIT  allows 
placing  upper  limits  on  the  elements  of  and  upper  and  lower  limits 

on  the  elements  of  A.  HOLD  allows  the  user  to  specify  a fixed  value 
for  any  element  in  A.  This  has  the  effect  of  lowering  the  A-dimensionality 
of  the  problem  from  NPARAM  to  NP,  as  defined  in  equation  1. 


In  certain  circum  cances,  the  values  of  M are  sure  to  diverge. 

As  an  aid  in  understanding  this  phenomena,  the  values  of  may  be 
considered  as  forming  a surface  in  an  NPARAM  + 1 dimensional  space,  in 
which  the  abscissae  are  the  elements  of  A,  and  the  ordinate  is  the 
corresponding  value  of  x^.  For  simplicity,  consider  an  NPARAM- dimensional 


cross-section  through  the  surface  parallel  to  the  Aj  axis,  as  shown  in 
Figure  la.  The  desired  value  of  A^  to  minimize  x^  is  shown  as  Aj.  The 
X^  minimization  algorithm,  however,  does  not  deal  with  the  x^  plot 
directly,  but  rather  with  3x^/5A.  The  cross-section  through  the 
corresponding  3x^/3A  surface  is~"shown  in  Figure  lb.  Effectively,  the 
algorithm  finds  the  slope  of  the  surface  at  the  starting  point 

A o»  ^nd  extrapolates  to  9 x^/^A  = 0.  As  can  be  seen  in  Figure  1,  this 
procedure  can  only  work  if  the  second  derivative  of  x^  at  the  point 
A^  is  positive  (x^  is  concave) . In  Figure  1 , this  condition  pertains 
in  the  interva^  Cj  £ A^  ±€2-  However,  if  the  starting  guess  were  the 
point  marked  A?,  the  extrapolation  toward  3x^/^A.  = 0 would  give  the 


* 1 fy,  , - - — — - - 

point  B in  Figure  Ib^as  the  next  guess  for  A^.  the  final  result  might 
be  "local  minim^"  A.',  or  might  be  further  divergence,  depending  iq}on 
the  slo^e  of  9x^/9A.  at  B.  There  is  little  chance,  however,  that  the 
point  A.  would  be  fiund. 


To  mollify  this  effect, 
GRID,  a modified  grid  search 


GENFIT  begins  »’ith  an  optional  call  to 
routine . 
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A grid  search  is  basically  a fairly  simple,  iterative  technique. 
Again,  an  initial  point,  A , is  used.  One-by-one,  the  elements  of 
A are  selected,  the  gradient  of  in  the  selected  dimension  is  calcu- 
lated, and  a series  of  steps  taken  toward  decreasing  in  the  selected 
dimension. 


The  modifications  to  the  basic  technique  which  were  developed  for 
GRID  deal  with  the  determination  of  step  size,  and  the  step  logic.  The 
initial  step  size  is  based  on  the  parameter  magnitudes  discussed  above. 
The  series  of  steps  is  taken  according  to  the  following  scheme; 

1)  a step  of  pre-selected  length  is  taken,  where  the  step  direction  is 
found  from  the  gradient  of  at  the  starting  point.  If  increases, 
the  search  returns  to  the  starting  point,  the  step  size  is  reduced,  2uid 
the  step  retaken.  This  procedure  is  repeated  until  the  x^  decreases 
(or  an  iteration  counter  reaches  its  maximum) . 2)  Upon  taking  a step 

that  decreases  x . the  gradient  at  the  new  point  is  compared  to  that  at 
the  starting  point:  A)  if  the  two  gradients  are  of  the  same  sign, 
the  minimxjm  has  not  yet  been  reached.  The  step  size  is  increased,  and 
the  algorithm  returns  to  step  1.  B)  If  the  two  gradients  are  of 
different  sign,  the  x^  minimum  lies  between  the  two  points.  A linear 
interpolation  is  done,  and  the  algorithm  moves  on  to  the  next  element 
of  A, 


Having  taken  the  series  of  steps  in  all  the  (active)  elements  of 
A,  the  output  subroutine  is  called  and  the  algorithm  is  repeated.  In 
GRID,  the  process  continues  until  no  element  of  A is  being  changed 
significantly,  or  until  the  user-dictated  maximum  number  of  steps  have 
been  taken.  Since  only  x'^  reducing  steps  are  taken,  GRID  exits  with 
its  lowest  x^.  and  returns  to  the  main  GENFIT  routine. 

For  purposes  of  illustration,  one  iteration  of  a GRID  search  in 
one  element-dimension  is  shown  in  Figui'e  Ic.  It  is  apparent  that  there 
is  a significantly  larger  interval  (D. .D^)  within  which  the  starting 
guess  can  fall  and  still  result  in  a convergeant  search.  Since  the 
GRID  search  operates  directly  on  the  x^  surface,  the  surface  needs  only 
to  slope  toward  A ; concavity  is  not  required. 

We  note  here  that  the  GRID  search  is  unique  to  GENFIT.  FNFIT 
used  a STEEPEST  GRADIENT  search,  which,  although  more  efficient  in 
some  cases,  was  too  undependable  for  general  use.  As  a final  observa- 
tion, notice,  in  Figure  l,^that  a starting  guess  to  the  right  of  D 
is  more  likely  to  end  at  A^ ' than  aJ,  regardless  of  the  search  tecnnique. 
For  the  common  case  in  which  the  shape  of  the  x^  surface  is  unknown 
and  there  is  no  a-priori  knowledge  upon  which  to  base  the  starting 
guess,  the  data  analyst  has  no  choice  but  to  "map  out"  the  x^  surface. 
Techniques  such  as  the  ravine  search^  have  been  developed  to  help 
systematize  such  a mapping;  however,  experience  and  knowledge  of  the 
system  remain  valuable  quantities. 
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III.  COMPUTER  QUERIES;  USER  RESPONSES 


Upon  execution  (eXQT  AMUCK*GENFIT.  FUNFIT)  the  user  is  welcomed, 
warned  against  using  file  (logical  unit)  2,  and  offered  an  example 
expression.  Then,  the  following  data  is  solicited: 

1.  EXPRESSION  - FORTRAN  statements  to  create  the  fitting  expression. 

Must  result  in  G = expression,  where  expression  involves  independent 
variables  [x(l),  X(2) , . . .]  and  parameters  [A(1) , A(2) , . . . ]. 

End  this  input  with  @EOF. 

2.  HEADING  - Up  to  10  lines  of  Hollerith  characters  to  be  reproduced 

at  the  top  of  each  output  page. 

3.  MODIFIED  GRID  ITERATIONS  - Number  of  iterations  of  the  GRID  search 

algorithm. 

4.  CHI-SQ  STEPS  - Number  of  iterations  of  the  (main)  quasi-linear 

routine . 

5.  OUTPUT  FORMAT  - 0 = output  only  at  end 

1 = parameters  and  each  iteration 

2 = acts  like  3 (below)  if  improves,  else  like  1 

3 = option  1 plus  full  data  output  on  every  iteration. 

6.  OOTPUT  ONTO  7?  - Unit  7 is  a designated  alternate  print  file. 

Program  output  can  be  to  any  file  designated  as  unit  7 for  subsequent 
examination  and  multiple  copy  printing.  Respond  YES  or  NO. 

7.  BINARY  FILE  FOR  CALCOMP  - Information  for  Calcomp  plots  (optional) 

is  written  on  a user  designated  file.  Responding  0 indicates  no 
plot  desired.  The  plot  option  can  be  turned  on  and  off  during  a 
run,  but  only  one  file  may  be  used.  See  section  V for  Calcomp 
instructions . 

8.  NUMBER  OF  INDEPENDENT  VARIABLES  - NI  as  defined  in  equation  1. 

9.  OUTPUT  VS.  WHICH  VARIABLE  - Orders  and  lists  data  points  by  chosen 

variable.  Also  computes  derivative  of  function  with  respect  to 
chosen  variable  for  output  option  4. 

10.  NUMBER  OF  PARAMETERS  (’A's)  - NPARAM  as  defined  below  equation  1. 

11.  PARAMETERS  HELD  CONSTANT.’  - Although  appearing  as  fitting  parameters 

in  the  user  express ion7  the  user  may  designate  any  parameters  to 
be  given  fixed  values  for  a given  fit.  Subsequent  fits  may  release 
or  change  these  parameters  (HOLD  option  as  discussed  in  section  II). 

IIA.  SUBSCRIPTS  - If  response  to  11  is  not  0,  the  subscripts  of  the  held 
parameters  are  solicited. 


12 


12.  STARTING  VALUES  ( , MAQ'llTUDES)  - Initial  values  for  the  parajneters, 

used  to  start  the  iterative  routines.  The  magnitudes  are  used  to 
derive  initial  step-size  GRID,  and  to  generate  points  for  the 
pol/noninal  approximations  in  the  derivative  routines.  If  the 
starting  value  is  non-zero,  anission  of  the  magnitude  results  in 
the  starting  value  being  taken  as  the  magnitude.  In  a well  deter- 
mined problem,  the  results  are  fairly  insensitive  to  this  entry. 
See  Section  VI. 

13.  SEAROl  CONTROLS  - Are  controls  on  the  parameters  desired? 

13A.  CONTROLS  - If  the  answer  to  13  is  YES , control  specifications  are 
solicited  in  three  categories:  (TJ“  control  of  change  size  during 
any  single  iteration,  (2)  absolute  minimun  allowed  value,  and 
(3)  absolute  maximum  allowed  value  for  specified  parameters. 

2 

1.  Change  size  - (x  minimization  only.)  If  solution  of  the 
quasi -linear  equatiwi  3'  results  in  elements  of  M exceeding 
specified  boiaids,  the  elements  are  reset  to  the  specified 
bounds.  Special  feature  - typing  ALL  bounds  changes  in  all 
parameters  to  O.S*  MAQJITUDE. 

2 

2.  Minimum  - (GRID  and  x minimization.)  If  the  new  A (■  A + AA) 
has  elements  that  are  lower  than  specified  minimum  boun^,  tH? 
elements  are  reset  to  the  specified  bounds.  Special  feature- 
Bounds  can  be  set  relative  to  the  current  value  of  another 
parameter.  Hence,  if  A(5)  is  always  to  be  0.4  greater  than 
A(4) , the  response  S,  0.4,  A(4)  will  use  the  current  value 

of  A(4)  in  assessing  the  bound  on  A{5) . 

3.  Maximum  - Works  like  (2)  above  on  elements  of  A that  exceed 
specified  maxima.  Special  feature  - as  in  (2),  relative  boimds 
may  be  specified.  Note  that  the  lower  bound,  (2),  is  always 
checked  first.  Hence,  if  two  relative  parameters  are  too 
close  together,  the  one  with  the  relative  mimimum  will  be 
increased.  In  order  to  "share"  the  change,  the  relative 
minimum  boimd  on  the  larger  parameter  should  be  one  half  of 
the  desired  difference,  and  a relative  maximum  equal  to  the 
whole  difference  should  be  inpressed  on  the  smaller  parameter. 
Thus,  if  A(4)  is  to  be  0.5  less  than  A(5) , and  it  is  desired 

to  control  both,  the  proper  sequence  is  to  respond  5,  0.25, 

A(4)  to  the  request  for  minimum  bounds,  and  4,  0.5,  A(5)  to 
the  request  for  maxima.  Then  if  an  iteration  resulted  in 
A(4)  = 1.6,  and  A(5)  = 1.7.  the  minimum  bound  would  first  change 
A(5)  to  1.6  + 0.25  = 1.85;  and  following  that,  the  maximum 
bound  would  change  A(4)  to  1.85  - 0.50  * 1.35. 

"NOTE  ON  THE  CONTROLS":  Since  the  new  values  of  all  parameters  (A) 
in  the  CHI-SO  minimization  search  are  interdependent,  the 
"artificial"  alteration  of  A or  AA  by  one  of  the  controls  could 
introduce  instabilities  in  the  search.  In  order  to  avoid  such 
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instabilities,  any  parameters  which  are  artificially  changed  are 
placed  on  a temporary  hold  stack  and  the  dimensionality  of  the 
problem  reduced  for  the  following  iteration  as  in  item  11 
above.  If  a following  iteration  results  in  more  parameters 
being  artificially  changed,  those  parameters  are  added  to  the 
stack.  When  an  iteration  occurs  in  which  no  parameter  is 
artificially  changed,  one  parameter  is  released  from  the  stack 
on  a first-on-first-off  basis.  In  this  way,  instabilities  are 
damped  out  by  adjustments  in  the  stable  parameters.  For  the 
convenience  of  the  analyst,  any  parameter  which  is  held  is 
shown  with  an  * after  its  value  in  the  output  for  output 
options  greater  than  0. 

14.  DATA  WEIGHTING  - Data  points  can  be  assigned  different  weights 

(the  diagonal  matrix  W in  equation  1),  Thre^  options  are  available: 

r.  Data  can  be  weighted  as  a power  of  Y (the  dependent  variable) . 
Hence,  responding  will  weight  each  data  point  inversely 
as  the  magnitude  of  Y (Data  points  with  large  Y values  are 
discounted  relative  to  small  Y) . For  many  processes  in  which 
the  relative  standard  deviation  improves  as  the  square  root  of 
the  number  of  counts,  responding  0.5  properly  weights  such  data. 

2.  As  a corollary,  the  response  0 weights  all  data  equally. 

3.  Each  datum  can  include  its  own  relative  weight.  The  DATA 
INPUT  (item  15,  below)  allows  for  such  input.  It  should  be 
noted  that  the  action  of  options  (1)  and  (2)  is  to  ignore  any 
prior  weights,  and  reweight  all  data.  To  avoid  this  action 
on  re-runs,  option  3 must  be  used.  Further,  all  weights  are 
relative.  After  input,  all  weights  are  normalized  such  that 
r W(I)  = 1 , as  specified  following  equation  1. 

15.  DATA  INPUT  - .Tie  data  is  input  in  free  field  format,  one  datum 

per  record.  The  order  is 

Xj.  Xj.  Xj.  . . . X„,,  Y,  <1 

where  the  Xi  correspond  to  the  independent  variables  in  the 

EXPRESSION  (item  1,  above),  and  W,  the  weight  for  each  datum. 

Is  read  only  if  option  (3)  is  selected  in  item  14  above.  The 

final  data  record  must  be: 

eEOF 

This  end-of-file  mark  initiates  execution  of  the  fit. 
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IV.  SUBSEQUENT  FITS 

After  completing  a fit,  control  of  the  program  returns  to  the 
user.  Since  the  EXPRESSION  has  been  compiled  into  the  GENFIT  program 
(available  as  an  absolute  element  called  FIT  in  TPF$.),  it  is  efficient 
to  repeat  the  fitting  process  at  this  time  if  desired.  GENFIT  asks  if 
another  fit  is  requested.  If  the  response  is  YES,  GENFIT  allows  selective 
changes  of  the  data  input  under  any  of  the  items  of  section  III  except 
item  1,  EXPRESSION.  Old  data  can  be  left  in,  augmented  or  rei^laced  as 
desired.  Reweighting  of  data  is  accomplished  as  described  in  section 
III,  item  14.  Initial  values  for  the  parameters  are  automatically 
updated  to  the  BEST  FIT  values  of  the  previous  fit;  however,  the  initial 
magnitudes  remain.  Both  of  these  can  be  changed  by  the  user. 


V.  CALC(M>  PLOTS  OF  FITS 

If  a non-zero  response  is  given  to  item  7 in  section  III,  selected 
data  points  and  a smooth  curve  of  EXPRESSION  values  encompassing  the 
data  points  are  written  on  a binary  file  by  GENFIT.  The  following 
solicitations  are  involved  in  writing  the  file. 

1.  INDEPENDENT  VARIABLE  FOR  PLOT  ABSCISSA  - Since  the  plot  produced 

is  2-dimensional  (Y  vs . one  X)  the  subscript  of  the  desired  X 
must  be  input. 

2.  CALCOMP  SYMBOL  - An  integer  (0  through  127)  indicating  the  system 

defined  plotter  symbol  used  for  data  points  on  the  plot.  If  the 
integer  is  preceded  by  a minus  sign,  the  plot  program  will  not 
create  a new  set  of  axes  for  the  current  plot.  Rather,  the 
current  plot  will  be  combined  with  the  preceding  plot(s),  jointly 
scaled,  and  superimposed  onto  one  plot. 

3.  VALUE  FOR  OTHER  INDEPENDENT  VARIABLES  - Since  the  plot  corresponds 

to  a slice  in  the  (item  1 - chosen)  direction  through  a multi- 
dimensional function  space,  fixed  values  for  the  other  independent 
variables  must  be  specified.  Thus,  in  the  example  of  lasers 
penetrating  plates,  the  dependent  variable  (penetration  time)  is 
plotted  versus  the  chosen  independent  variable(e.g.  laser  intensity) 
while  the  other  independent  variables  (plate  thickness,  total  power, 
airflow  velocity)  are  held  at  the  values  specified  by  this  input. 
This  input  also  solicits  the  ranges  within  which  data  points  must 
fall  to  be  included  on  the  plot.  Hence,  if  the  laser  penetration 
time  is  to  be  plotted  versus  intensity  for  L cm.  plates,  the 
response  1.,  0.9,  1.2  would  evaluate  and  plot  the  expression  for 
a thickness  of  1.,  and  would  place  on  the  curve  those  data  points 
for  which  the  thickness  fell  between  0.9  and  1.2. 

The  program  responds  to  the  above  inputs  by  telling  the  user  how 
many  data  points  fall  within  the  specified  ranges.  If  at  least  one 
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datum  qualifies,  the  user  is  asked  if  he  wishes  to  have  the  plot  infor- 
mation written  on  his  previously  specified  binary  file.  If  response  is 
YK,  the  information  is  blocked  and  written. 

Whether  the  above  response  is  YES  or  NO,  the  program  asks  if  more 
plots  are  desired.  If  this  response  is  YES,  the  program  returns  to  item 
1 of  this  section.  Thus,  the  user  can  generate  several  plots  of  a given 
fit,  overlaying  whichever  he  wishes,  selecting  various  independent  var- 
iables as  abscissae,  and  fixing  different  values  for  the  non-selected 
iiidependent  variables. 

Upon  termination  of  the  fitting  program,  the  plot  information 
binary  file  is  closed.  To  create  a calcomp  plot  tape  from  this  file, 
the  user  must  execute  the  plot  program. 

8XQT  AMUCK*GENFIT.  CALCOMP 

The  CALCOMP  program  sends  the  following  queries: 

A.  INPUT  UNp  AND  OITTPUT  UNIT  - The  INPUT  UNIT  is  the  plot  infor- 
mation  binary  file  described  above.  The  OUTPUT  UNIT  is  the 
Calcomp  tape. 

B.  IS  SCALE  LIMIT  INPUT  DESIRED?  - If  answered  the  program 
will  ask  for  XMIN  and  XMAX,  the  desired  extent  of  the  plot 
abscissa,  and  YMIN  and  YMAX,  the  desired  extent  of  the  plot 
ordinate,  before  each  plot. 

C.  TITLES  FOR  GRAPHS  - For  each  new  set  of  axes  (see  item  2 of 
this  section)  a TITLE  of  less  than  73  characters  is  solicited. 

If  no  (further)  titles  are  desired,  typing  SEOF  suppresses  this 
solicitation. 

D.  )MIN,  XMAX,  YMIN,  YMAX,  OR  8E0F  - If  the  answer  to  B.  was  YES, 
the  program  asks  for  the  desired  axes  extrema.  However,  the” 
current  plot  can  be  scaled  as  if  B were  answered  NO  by  now 
answering  8E0F. 

The  CALCOMP  program  ends  with  a report  of  how  many  data  sets  were 
drawn  on  how  many  plots  (sets  of  axes) . The  Calcomp  tape  is  closed, 
and  may  be  plotted  in  actox-u  with  system  standard  operating  procedure. 


VI.  A GENFIT  EXAMPLE 

Situation:  An  unknown,  very  strong  radioactive  source  is  discov- 
ered at  a storage  site  being  decommissioned.  A monitor  (monitor  A) 
is  set  up  at  a safe  distance,  and  the  US  Army  RADCON  team  is  called  in 
to  perform  a detailed  survey.  Two  hours  later,  the  RADCON  team  advance 
party  has  already  arrived  and  set  up  a monitor  (monitor  B)  closer  than 
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monitor  A.  An  hour  later,  the  full  team  is  present,  and  a series  of 
measurements  are  taken  to  within  the  500  R/hr.  level. 


It  is  hypothesized  that  the  package  contains  two  rather  short- 
lived isotopes.  The  meter  readings  are  calibrated,  and  the  data  fit 
to  a curve  of  the  form 

RATE  -[ij  * exp  (-  * t)  ♦ I2  * exp(-  X2  * t)] /R^  (9) 

The  data  are  given  in  Table  I. 

Looking  at  equation  9,  a few  facts  are  easily  noted.  All  the 
parameters  must  remain  greater  than  0.  However,  even  with  that  restric- 
tion, the  value  of  the  function  could  be  very  sensitive  to  small  changes 
in  X.,  and  X2.  A more  subtle  difficulty  could  be  the  following:  the 
two  terms  in  the  numerator,  subscripted  , and  appear  identical  to 
GENFIT,  as  they  stand.  It  is  possible  that  the  fit  will  not  be  able 
to  decide  which  of  the  two  describes  the  shorter-lived  isotope  and  will 
fall  into  an  oscillation  between  two  identical  minima  in  the  surface. 

The  COOLIT  option  provides  a mechanism  for  alleviating  the  above 
problems.  As  listed  by  GENFIT  on  the  top  of  the  output,  non-negative 
minima  and  change  limits  were  applied  to  both  A(3)  and  A(4) . Further- 
more, A(3)  was  chosen  to  be  the  time  constant  of  the  short-lived  isotope; 
therefore,  the  minimum  of  A(3)  was  set  relative  to  A(4) . TTiis  provides 
the  needed  asymmetry  between  the  two  terms. 

The  results  of  the  fitting  runs  are  given  in  Table  II.  The 
different  sets  of  results  were  the  result  of  changing  the  initial 
conditions,  data  weights,  and  parameter  controls.  The  first  set  was 
run  with  A(2)  held  at  the  value  8.0E3,  which  was  the  value  used  to 
generate  the  data.  (Note:  The  data  for  this  example  was  generated 
using  the  values  listed  in  Table  III  and  introducing  random  round-off 
errors . ) 

The  second  set  of  results  used  the  results  of  set  #1  as  starting 
values.  The  lower  CHI-SQ  resulted  from  having  more  independent  param- 
eters. The  third  set  of  results  came  from  weighting  all  data  evenly, 
instead  of  a 10:1  weighting  toward  the  Radeon  data  in  sets  1 and  2. 

Figures  2-5  are  plots  of  the  fit  of  answer  set  #1.  However,  the 
plots  of  the  other  answer  sets  are  identical  in  this  range.  Notice 
that,  although  actually  exponential,  the  range  of  the  independent 
variable  time  [X(l)3  is  so  short  relative  to  the  half-lives  that  the 
curve  appears  linear. 

Situation  Continues:  After  the  initial  RADCON  survey,  it  is 
decided  to  transfer  the  unknown  source  to  a private  laboratory  for 
positive  identification.  However,  because  of  the  time  required  for 
administrative  functions,  it  is  six  months  to  the  minute  when  the 
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TABLE  I.  DATA  FOR  EXAMPLE  PROBLEM 
MONITOR  A (DISTANCE  lOOm)  MONITOR  B (DISTANCE  10m) 


TIME  Chr.l 

RATE  Gets) 

RATE  fctsl 

0.0 

1.80 

O.S 

1.79 

1.0 

1.79 

1.5 

1.78 

2.0 

1.78 

177.8443 

2.5 

1.77 

177.3127 

3.0 

1.77 

176.7840 

3.5 

1.76 

176.2582 

4.0 

1.76 

175.7352 

SURVEY 

TIME  - 3.0  hr 

DISTANCE  (m.) 

RATE  Gets! 

100. 

1 . 7678 

50. 

7.0714 

25. 

28.2854 

15. 

78.5707 

12. 

122.7667 

10. 

176.7840 

8. 

276.2250 

7. 

360.7837 

6. 

491.0667 

5.5 

584.4099 
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TABLE  II.  FITTING  PARAMETCRS  FROM  TABLE  I DATA 


UATA 

SET 

2 

X 

A(l) 

AC2) 

A(3) 

A(4) 

1 

2.602-6 

1.0+4 

8.0+3 

1.088-2 

1.341-5 

2 

2.S11-6 

1.0+4 

8.0+3 

1.088-2 

1.345-5 

3 

1.661-5 

1.0+4 

8.0+3 

1.088-2 

1.353-5 

4 

2.675-6 

5.809+3 

1.219+4 

1.394-2 

2.296-3 

5 

8.384-5 

3.807+3 

1.420+4 

2.973-2 

1.329-4 

6 

1.675-5 

4.414+3 

1.359+4 

1.536-2 

3.022-3 

7 

3.171-7 

6.005+3 

1.2+4 

1.356-2 

2.289-3 

8 

5.876-7 

7.791+3 

1.021+4 

1.228-2 

1.304-3 

9 

4.311-7 

9.206+3 

8.794+3 

1.092-2 

9.278-4 

TABLE  III. 

TOE  GENERATOR  SET 

A(l) 

A(2) 

A(3)  A(4) 

1.0+4 

8.0+3 

1.0896-2  7.1209-8 
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Figure  2.  Count  Rate  Versus  Time  for  Monitor  A (squares)  and  Radeon 
Monitor  (octagons). 
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private  laboratory,  using  properly  calibrated  equipment,  repeats  the 
RAOCON  measurement.  Unfortunately,  plans  to  open  the  source  are 
thwarted  by  a cost  over-run.  It  is  exactly  one  year  from  the  first 
measurement  that  an  extension  is  approved,  and  a second  laboratory 
measurement  is  made.  These  final  2 data  points  are  listed  in  Table  IV, 
and  the  fit  that  results  by  including  them  with  the  Table  I data  are 
shown  in  Table  V and  Figure  6. 

It  is  illustrative  to  note  that  the  final  fit  is  now  reasonably 
independent  of  the  starting  guess.  In  analyzing  the  problem,  the 
reason  becomes  clear.  The  CHI-SQ  surface  created  by  equation  9 and  the 
data  in  Table  I is  an  extremely  complex  one,  having  a large  number  of 
local  minima,  especially  in  the  region  around  the  generator  set  (Table 
III).  The  large  number  of  "good"  answers  results  from  the  inability 
of  the  data  to  determine  the  parameter  values.  Physically,  this 
resulted  from  trying  to  determine  half-lives  in  the  order  of  days  and 
years  by  making  noisy  measurements  over  a few  hours.  Such  an  experiment 
is  a poorly  determined  one,  which  is  reflected  in  the  fitting  process 
(the  shape  of  the  (3II-SQ  surface) . When  the  6 and  12  month  points  are 
added,  however,  the  experiment  becomes  a better  one  and  the  data  does 
span  sufficient  range  to  overcome  the  noise  in  the  data.  The  addition 
of  the  two  points  sizably  reshapes  the  CHI-SQ  surface,  and  reasonable 
searches  are  successful. 

This  is  clearly  seen  by  comparing  Figures  4 and  6,  the  plots  of 
Y vs.  X(l)  with  and  without  the  two  late  data  points.  Without  the  two 
points,  the  fits  are  practically  straight  lines  within  the  data  scatter. 
However,  with  the  two  late  points,  the  data  are  extended  over  a range 
in  which  the  curvature  is  pronounced. 

As  a final  exercise,  a number  of  very  bad  initial  guesses  were 
taken,  including  starts  with  zero  intensity  for  the  short-lived  isotope. 
Although  requiring  more  GRID  search  steps,  these  bad  starts  did  not 
prevent  GENFIT  foim  finding  the  generator  set  as  long  as  the  6 and  12 
month  points  were  included. 


VII.  SUMMARY 

The  GENFIT  code  is  an  extremely  versatile  multi -variable, 
multi-parameter  function  to  data  fitting  routine.  Because  of  its 
simplified,  completely  interactive  input,  and  many  defaults,  the 
program  is  also  easy-to-use.  Initial  guesses  are  required;  however, 
the  search  is  sensitive  to  these  guesses  only  if  the  data-producing 
experiment  was  ill-conditioned  to  the  parameters  being  determined. 

The  available  output  options  include  an  easy-to-use  plotting 
package  which  yields  graphical  comparisons  between  function  and  data. 
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TABLE 

TIME  Chr.) 
4320. 
8760. 


TABLE  V. 


IV.  6 MONTH  AND  12  MONTH  DATA 
Distance  = 1 . m 

RATE  (cts) 

7997.5394 

7995.0112 


FITTING  PARAMETERS  FROM  ALL  DATA 

A(4)* 


2 


A(l)* 


A(2 
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A(3)* 


V9’89l 


T6*oei 


8T’86  Sfr-99 

( nj*  ( I )A 


ez.*2e 


00*  (P 


•W  01  Oi  031038  bibO  3ib1  *•  3WIi  ’SA  3100 


26 


L 


REFERENCES 


1.  J.T.  Klopcic,  "FNFIT:  An  Easy-to-Use,  Arbitrary  Function-to-Data 

User's  Manual),"  BRL-MR-2402.  (Aug  74) 

Estimation,  ed.  A.  Gelb.  M.I.T.  Press,  Cambridge, 


3.  P.R.  Bevington,  Data  Reduction  and  Error  Analysis  for  the  Physical 
Sciences . McGraw  - Hill  Book  Co'.,  N.Y.,  N.Y.  (196^) ^ 


4. 


Y.  Beers,  Introduction  to  the  Theory  of  Error. 
Publishing  Co.,  Reading,  MA.  fl957) ’ 


Addison  - Wesley 


I 


I 

1 

f 

i 


27 


k 


APPENDIX  A 


EXAMPLE  RESPONSES  AND  OUTPUT 

Figure  A-1  of  this  appendix  contains  the  terminal  I/O  for  one 
of  the  runs  made  during  construction  of  the  example  problem  (section  VI, 
Min  body  of  this  report).  The  responses  (from  terminal)  are  indicated 
by  arrovrs.  (The  data  had  previously  been  placed  in  file  GENFITDATA.) 
Figure  A- 2 is  the  output  which  was  redirected  to  the  alternate  print 
file  (7)  for  ease  of  publication. 
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